Characterizing Model Errors and Differences

Authors

  • Stephen D. Bay
  • Michael J. Pazzani
Abstract

A critical component of applying machine learning algorithms is evaluating the performance of the induced models and using the evaluation to guide further development. Traditionally, the most common evaluation metric is error or loss; however, this provides very little information for the designer to use when constructing a system. We argue that an evaluation method should provide detailed feedback on the performance of an algorithm and that this feedback should be in the language of the problem: our goal is to characterize model errors, or the differences between models, in the feature space. We provide a framework for this that allows different algorithms to be used as the discovery engine, and we consider two approaches: (1) a classification strategy where we use a standard rule learner such as C5; (2) a descriptive paradigm where we use a new discovery algorithm: a contrast set miner. We show that C5 suffers from several problems that make it unsuitable for this task.
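To make the idea of describing errors "in the language of the problem" concrete, the following is a minimal sketch and not the authors' contrast set miner. It assumes categorical features, considers only single attribute=value conditions, and applies no statistical test; the function name `support_differences`, the `min_diff` threshold, and the toy data are all hypothetical. It ranks conditions by how differently they are supported among misclassified and correctly classified examples.

```python
from collections import Counter

def support_differences(rows, is_error, min_diff=0.2):
    """Rank attribute=value conditions by the gap between their support
    among misclassified rows and among correctly classified rows.
    Illustrative only: single conditions, no significance testing."""
    errors = [r for r, e in zip(rows, is_error) if e]
    correct = [r for r, e in zip(rows, is_error) if not e]
    if not errors or not correct:
        return []  # nothing to contrast if one group is empty

    def support(group):
        # Fraction of rows in the group for which each condition holds.
        counts = Counter((k, v) for r in group for k, v in r.items())
        return {cond: n / len(group) for cond, n in counts.items()}

    s_err, s_ok = support(errors), support(correct)
    results = []
    for cond in set(s_err) | set(s_ok):
        e, c = s_err.get(cond, 0.0), s_ok.get(cond, 0.0)
        if abs(e - c) >= min_diff:
            results.append((cond, e, c))
    # Largest gap first; ties broken by condition name for determinism.
    results.sort(key=lambda t: (-abs(t[1] - t[2]), str(t[0])))
    return results

# Hypothetical toy data: the model errs on every "young" example.
rows = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "old", "income": "low"},
    {"age": "old", "income": "high"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "high"},
]
is_error = [True, True, False, False, True, False]

for cond, err_support, ok_support in support_differences(rows, is_error):
    print(cond, round(err_support, 2), round(ok_support, 2))
```

The same function could contrast two models instead of errors versus non-errors by passing a flag that marks the examples on which the models disagree. A real contrast set miner would also search conjunctions of conditions and control for false discoveries, which this sketch omits.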

Similar resources

A Practitioner's Guide to Cluster-Robust Inference

We consider statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters. Examples include data on individuals with clustering on village or region or other category such as industry, and state-year differences-in-differences studies with clustering on state. In such settings default standard erro...

Inequality and the Lifecycle

This paper investigates the sources of cross-sectional differences in consumption, labor supply, wealth and welfare over the lifecycle. I document the existence of rich and informative lifecycle patterns in the joint distribution of wages, hours, consumption and wealth. I then estimate a structural model of precautionary savings with endogenous labor supply and uninsurable wage risk in an attemp...

The generation of fuzzy sets and the construction of characterizing functions of fuzzy data

Measurement results contain different kinds of uncertainty. Besides systematic errors and random errors, individual measurement results are also subject to another type of uncertainty, so-called fuzziness. It turns out that special fuzzy subsets of the set of real numbers $\mathbb{R}$ are useful to model fuzziness of measurement results. These fuzzy subsets $x^*$ are called fuzzy numbers. The m...

Efficiency analysis in the presence of uncertainty

In a stochastic decision environment, differences in information can lead rational decision makers facing the same stochastic technology and the same markets to make different production choices. Efficiency and productivity measurement in such a setting can be seriously and systematically biased by the manner in which the stochastic technology is represented. For example, conventional production f...

Cournot equilibrium as emergent behavior in a nonrenewable resource agent-based model

In a simple agent-based model of a small oligopoly nonrenewable natural resource model, the agents, communicating solely through the market price, sometimes exhibit collusion-like behavior, sometimes Cournot-like behavior. The collusion-like behavior is shown to arise when differences between the agents are small. Conversely, the Cournot-like behavior is shown to result from differences in produc...

Publication date: 2000